Abstract

Complex socio-economic networks such as transportation networks, in- 
formation systems and even underground organizations are often designed 
for resilience - to be able to function even if some of the nodes (e.g. hubs,
routers, operatives, etc.) are compromised by a human or natural adversary.
In many cases the adversary threatens to cause a cascade where the
failure of one node leads to some of the adjacent nodes being lost as well, 
and then many more nodes lost in a far-reaching domino effect. Such 
cascades motivate the search for mechanisms and network designs that 
would increase network cascade resilience. This work introduces a math- 
ematical model in which such networks are viewed as the solution to an 
optimization problem that trades off cascade resilience against efficiency, 
and describes the optimal networks under this model. The results show 
that of the network designs considered, a network consisting of multiple 
star-like cells is optimal. Perhaps surprisingly, for many network designs and
parameter values the edge density of the optimal network does not decrease
monotonically as cascade risk increases, but may rise again when cascade risk
is high. This implies
that certain networks ought not to be modified for cascade resilience, since 
the cost in efficiency is too high. Understanding cascade resilience and its 
structural phase transitions will ultimately help identify vulnerabilities 
and design more durable networks in many diverse application areas. 

Keywords: networks, resilience, cascade, contagion, epidemics on networks, ter- 
rorism, terrorist networks 

1 Introduction 

Contagions on networks have been a major theme in the field of network re- 
search. For example, in a power grid the loss of a single transmission or genera- 
tion node may cause nearby nodes to be overloaded, becoming disconnected or 
damaged. These failures might propagate widely through the network, leading
to widespread blackouts and large damage to the economy. Rebuilding the net- 
work might involve both repair of hardware and complex start-up procedures 
that would take considerable time. These cascade phenomena are also found 
in other domains such as disease control, computer networks, financial markets 
and social systems. A particularly interesting example is clandestine social
networks, such as terrorist networks or guerrillas operating in a hostile
environment. If one of the nodes (i.e. operatives) is captured by law
enforcement agencies, it may betray all the nodes connected to it, leading to
their probable capture.

The focus of this work is to investigate a novel problem in network research 
- how to build cascade-resilient networks. To date much research has explored 
the extent of cascades and the nature of their propagation, specifically looking 
at important classes of networks [6] [9] [11] [18] [30]. Here the focus is different
in that the topology is not fixed because in many networks cascade resilience is 
a design criterion. By "resilience" it is meant the ability to reduce the extent 
of contagions. The objective here is to identify topological features that can 
endow networks with high contagion resilience. It is hoped that ultimately it 
would be possible to identify a general prescription for building cascade-resilient 
networks for many different domains. Finding such features would be useful not 
only in networks that are being designed de novo, but also in a much broader 
class of networks where some changes could be made to the topology even if 
complete redesign is infeasible. These include power grids (which could be locally
upgraded), social networks facing epidemics (through isolation of certain nodes) 
and others. 

Because the relative importance of efficiency and resilience depends on op- 
erating conditions, the optimal design is expected to be not a single pattern, 
but multiple different patterns, with possible sharp transitions between them. 
In research on terrorism and guerrilla movements, a classic pattern is the
tree-of-cliques cellular hierarchy (Fig. 1) [24] [19] (see also research on
crime networks in industry [3] and drugs [25]). However, it is clear that both in terrorism
and in other domains the optimal topology depends on conditions such as the 
risk of cascades and the purpose of the network. Indeed, while the cellular tree 
design is very secure, its tree-like structure is very vulnerable to becoming dis- 
connected. Moreover, the near-tree structure means that messages propagating 
on the network would take many hops to reach their destination, which limits 
this pattern's operational efficiency. 

The model introduced in this paper addresses this problem in a simplified 
context of graph theory, as follows. It is clear that in many networks it is 
possible to increase cascade resilience by many means, not only through their 
topology. For instance, in controlling respiratory diseases, it is possible to ask 
people to wear face masks, reducing the spread of contagion. Also, in clandestine
social networks one sees additional types of nodes ("dead drops" or "couriers") 
whose purpose is to prevent cascades. Those types of defenses are interesting 
both in practice and in theory, but they involve heterogeneous graphs (with 
multiple types of nodes and edges) whose models are both more complex and 
application-specific. Therefore here the focus is on simple graphs as models of 
networks in the view that the conclusions derived from such models would also 
be applicable to more complex situations. In the remainder, the words "network" 
and "graph" G will be used to mean the same object: a tuple (V, E), where V is
a set of "nodes" and E is a set of "edges", where each edge is an unordered pair 
of nodes. 

[Figure 1 panels: (a) FTP group; (b) FTP battalion]

Figure 1: The basic organizational unit of a French World War II underground
network, Francs-tireurs Partisans (FTP), was the combat group (a). This was divided
into two "teams" of three fighters, where leader L1 was in overall command and in
command of team 1. His lieutenant, L2, led team 2 and assumed overall command
if L1 was captured. The small degree of the nodes ensured that the capture of any
one node did not lead to the capture of a significant fraction of the organization.
Such groups were organized into a hierarchy (b) where 3 groups made a "section",
3 of which made a "company", and finally 3 companies made a "battalion" [24]. In the
battalion figure, a leaf node corresponds to the leader of a group (subordinates not
shown).

Even on simple graphs, designing resilience to cascades is not a simple problem
to formulate. As the tree-of-cliques example suggests, practical network
designs must balance resilience against suitably defined performance or
efficiency. Indeed, intuitively the most cascade-resilient network is the
network with no edges (no cascades can propagate), but it is also the least
useful kind of network. Therefore, searching for the most resilient design is
not the right objective. Rather, the true objective is to maximize a certain
combination of resilience and efficiency, which is termed "fitness". It is
expected that maximizing resilience and maximizing efficiency will typically be
in opposition. Just as disconnected networks are resilient and inefficient,
highly efficient networks such as complete graphs are likely to have low
resilience [34].

Such a trade-off suggests formulating the question as one of mixed-objective
optimization, where the solution space is a space of simple graphs, possibly with a
fixed number of nodes. In this formulation one must overcome two key problems.
First, this is a very large search space even if the number of nodes and edges
in the graph is as small as a hundred. Second, it is important to be able to
measure efficiency smoothly in both connected and disconnected topologies, but
many familiar graph functions are suitable only for connected topologies.
Both of those issues are addressed here: to reduce the search space the graphs are 
constructed using parametrized generating programs, and efficiency is measured 
by a metric termed "distance-attenuated reach". 
Research on sociological networks indicates that resilience and efficiency
might be just two of several design criteria, which also include e.g.
"information-processing requirements" that impose additional constraints on
network designs [3]. In the original context "information-processing" refers to the need to have
ties between individuals involved in a particular task, when the task has high 
complexity. Each individual might have a unique set of expertise into which all 
the other agents must tap directly. Generalizing from sociology, such "functional 
constraints" might considerably limit the flexibility in constructing resilient and 
efficient networks. For example, in the context of terrorism, this constraint sig- 
nificantly decreased the quality of attacks that could be successfully carried out 
in the post 9/11 security environment [33]. Such functional constraints could 
be addressed by looking at a narrow set of models of networks which already 
incorporate such constraints. The specific models to be examined are motivated 
by analytical expediency, but in general one may want to consider a particular 
design palette dictated by the application at hand. The current objective is to 
identify which elements might be useful for increasing the fitness of networks, 
and thus suggest elements that are useful to have in the palette, even if it is 
constrained. 

There is a very extensive literature on both cascades and resilience. For 
instance, a number of investigations considered resilience to removal of nodes 
or edges. An important result in this domain is that certain types of scale-free 
networks but not others are sensitive to targeted node removal [1] El [11] [16]
[15]. Interesting research looked at different models of contagion [6] [10], as well
as non-topological mechanisms for increasing resilience [21] [26]. Resilience
has also attracted a lot of research in the area of secret societies
such as terrorist networks |13l [2] [13] [25] [31]. In fact many secret societies
are benign, including non-governmental organizations and dissident movements 
operating in hostile political environments. Related problems have also been 
studied in epidemiology, where the question focused on immunization strategies 
(e.g. [31]) but apparently not as a question of optimal network design. Game- 
theoretic methods have recently been applied to the resilience-efficiency trade-off 
in terrorist network design [23] [22].

The main contribution of this work is to systematically attack the problem 
of building cascade-resilient networks. The closest to the current work is the
short report [29] (but it uses a very different model of cascades involving node
capacities). Also novel is a metric for defining efficiency of networks, which 
enables studying both connected and disconnected topologies. 

The organization of this paper is as follows: section 2 formalizes the problem 
mathematically and section 3 introduces various classes of networks. The results 
are exhibited in section 4, and discussed in section 5. Mathematical details are 
found in the Appendix. 
2 Formal Model



The novel approach here is to represent the resilience-efficiency trade-off above 
as an optimization problem where the decision variable is the topology of a 
graph and the objective function is a metric combining resilience and efficiency 
termed "fitness", F(G). Namely, the problem is to find the graph G(V, E) chosen
from a set of graphs $\mathcal{G}$ that solves

\[
\max_{G \in \mathcal{G}} \; \underbrace{r\,R(G) + (1 - r)\,W(G)}_{F(G)} \tag{1}
\]

where:

R(G): resilience to cascades
W(G): efficiency
$r \in [0, 1]$.

The set of candidate graphs over which (1) is maximized is here called the
"design" of the network. This design could be quite narrow (e.g. an
Erdős-Rényi random graph on n nodes [28]) or very broad (e.g. any n-node
graph). Both of the functions R(G) and W(G) (W stands for "work") have range
within [0, 1], to make them independent of network size. The parameter r
weighs resilience against efficiency, and depends on the problem being studied.
Much of the discussion below concerns the neighborhood of r = 0.5, where the
two are weighed equally. On a practical level, r could be interpreted as the
cost of restoring the network after a cascade: is it light or catastrophic?

In order to define R(G) one needs to provide (a) a realistic model of conta- 
gions, and (b) a metric of contagion resilience relevant to that model, and then 
repeat steps (a)&(b) for W(G). In the next subsection two simple measures of 
R(G) and W(G) are introduced that should be applicable to a range of different 
scenarios. However, in many problems it is interesting to consider the general
formulation (1) above, even if the definitions of resilience and efficiency are
different. The issue of defining a suitable set of graphs is more challenging
due to the size of the search space, and is discussed in section 3.



2.1 Measuring Resilience 

Research on graph theory has led to the development of a variety of metrics 
of robustness or resilience [19]. However, unlike in many other studies, here
the interest is in resilience to cascades rather than to e.g. disconnection. A natural
definition of resilience is the expected size of the surviving network as a fraction 
of the whole, for a cascade that originates at a single node: 

\[
R(G) = 1 - \frac{1}{n - 1}\, \mathbb{E}\left[\text{extent of a cascade}\right]. \tag{2}
\]

For simplicity, assume that cascades start at all nodes with uniform probability. 
It would be easy to extend this to cases where factors such as graph topology, 
node degree and even node type (for heterogeneous graphs) play a role. Note
that the definition considers the case where cascades spread beyond the
immediate neighbors of the start node, since this possibility is important in practice.

A simple model of a contagion is the following probabilistic model: each node
in the graph can be in one of three states, "susceptible", "infected" and "removed",
designated S, I and R respectively (these names are from epidemiology). Time
is assumed to move in uniform discrete steps. A node in state S at time t stays
in this state unless a neighbor "infects" the node, causing it to move to state I
at time t+1. Specifically, a node in state S at time t has a node-independent
probability τ of turning to state I at time t+1 if an adjacent node is in state I
at time t. Finally, a node in state I at time t always becomes R at time
t+1. Once in state R, a node remains there for all future times. In general
the transition I → R could take more than one time step, but adding this
effect would mostly serve to increase the probability of transmission [27], which
is already parametrized by τ. Indeed, in certain applications the time-step model
is very realistic: reportedly, some groups train agents to hold information for a
set time period, so that their contacts have time to hide.
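
The contagion model and the resilience definition (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the graph is assumed to be a dict mapping each node to the set of its neighbors, every infected node is assumed to try each susceptible neighbor independently with the transmission probability (called tau in the code), the cascade extent is assumed to count the nodes lost in addition to the seed so that R(G) stays in [0, 1], and the function names are hypothetical.

```python
import random


def cascade_extent(adj, start, tau, rng):
    """Run one discrete-time S -> I -> R cascade seeded at `start`.

    Returns the number of nodes removed in addition to the seed
    (an assumption made so that dividing by n - 1 gives a fraction).
    """
    state = {v: "S" for v in adj}
    state[start] = "I"
    infected = [start]
    while infected:
        newly_infected = []
        for u in infected:
            for v in adj[u]:
                # Each infected neighbor gets one independent chance to infect v.
                if state[v] == "S" and rng.random() < tau:
                    state[v] = "I"
                    newly_infected.append(v)
        # Infected nodes are removed after exactly one time step.
        for u in infected:
            state[u] = "R"
        infected = newly_infected
    return sum(1 for s in state.values() if s == "R") - 1


def resilience(adj, tau, trials=2000, seed=0):
    """Monte Carlo estimate of R(G) = 1 - E[extent] / (n - 1)."""
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 1.0
    total = 0
    for _ in range(trials):
        # Cascades start at a uniformly random node.
        total += cascade_extent(adj, rng.choice(nodes), tau, rng)
    return 1.0 - total / (trials * (n - 1))
```

With tau = 0 no cascade spreads and the estimate is exactly 1; with tau = 1 on a connected graph every cascade consumes the whole graph and the estimate is exactly 0. Intermediate values require enough trials for the average to stabilize.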



2.2 Measuring Efficiency 

The problem of measuring efficiency is even more involved than the problem 
of resilience because efficiency can only be computed by knowing the ultimate 
function of a network. Yet, the function could be very dependent on the net- 
work, even if the contagion structure is similar. Ideally, the efficiency metric 
would (a) be general enough for a variety of problems, (b) be suitable for
connected, weakly-connected and disconnected networks alike, and (c) be
computationally and analytically simple. It is clear that for many applications the distance
between pairs of nodes in the network is the most important determinant of 
the network's efficiency. This idea motivates the following "distance-attenuated 
reach" metric of efficiency, which gives the average neighborhood size of each 
node, corrected by the distance to the nodes in this neighborhood. Namely, for 
all nodes u ∈ V, weigh each v ∈ V \ {u} by the inverse of its distance to u:

\[
W(G) = \frac{1}{n(n-1)} \sum_{u \in V} \sum_{v \in V \setminus \{u\}} \frac{1}{d(u,v)^{g}},
\]

where g ≥ 0 is a parameter and normalization by n(n-1) ensures that
0 ≤ W(G) ≤ 1. As usual, for any node v disconnected from u, set 1/d(u,v)^g = 0. An
equivalent formula is the following: if V_{u,d} is the set of nodes at
distance d from u (d goes from 1 to ∞), then

\[
W(G) = \frac{1}{n(n-1)} \sum_{u \in V} \sum_{d=1}^{\infty} \frac{|V_{u,d}|}{d^{g}}.
\]

Thus, if the network has short paths, then the metric approaches 1.0 because
|V_{u,d}| is large for small d, and if it has long paths or is disconnected,
then |V_{u,d}| is small for small d. (One may even generalize this metric to
replace d^g with d^g + D for some D > 0 to take into account the possibility of
connecting to nodes through a costly alternative route that bypasses the given
network.) The parameter g could be termed the "connectivity attenuation" of the
network because it represents the rate at which distance decreases the
connectivity between nodes. In some problems (such as the Internet) the presence of v
on the same connected component as u is completely sufficient for providing the
services of v to u (such as serving documents), implying that g → 0. In other
problems (such as trust networks) one can trust only one's friends and much less
their friends, corresponding to attenuation g ≫ 1. Attenuation is expected to
have a significant effect on the optimization problem: when attenuation is
rapid it is hard to build resilient networks, because reducing the density of
edges to decrease cascade risk would radically reduce the efficiency W(G).
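
The distance-attenuated reach can be computed directly with one breadth-first search per node. The sketch below is an illustration under stated assumptions rather than the paper's code: the graph is assumed to be a dict mapping each node to the set of its neighbors, unreachable pairs contribute 0, and the names bfs_distances and efficiency are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def efficiency(adj, g):
    """Average of 1 / d(u, v)**g over ordered pairs of distinct nodes.

    Pairs in different components never appear in the BFS result,
    so they contribute 0 to the sum, as in the definition above.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d ** (-g)
    return total / (n * (n - 1))
```

On a complete graph every pair is at distance 1, so the function returns 1 for any g; on a graph without edges it returns 0. The cost is one BFS per node, which is adequate for networks of a few hundred nodes.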

In general, the current modeling approach is interesting to contrast with the 
work of Lindelauf et al. on terrorist networks [23] [22]. As here, they
consider two optimization criteria: "secrecy" and "information", corresponding to
resilience and efficiency. Their secrecy metrics are based on the idea that the
capture of a member of a secret group will lead to the loss of their immediate
neighbors, with some probability. The issue of cascades is not considered,
probably for reasons of analytic tractability. The metric of information is defined by
looking at the average shortest-path distance between pairs of nodes (and the 
network is required to be connected). Most different is the interesting appli- 
cation of game theory to find the optimal network. Whereas here the optimal 
network is the solution to an optimization problem, their optimal network is 
the Nash equilibrium in a bargaining game involving a "secrecy player" and an 
"information player". 

The above definitions of efficiency and resilience provide an intuition about 
the optimal design of a network. Since edges increase efficiency of the graph, as 
the probability of cascades decreases (τ → 0), regardless of design, the optimal
network would grow more dense in order to reduce the average distance on
the network. However, since cascades propagate through edges, as τ → 1, the
optimal networks would become more sparse in order to maximize resilience at
the expense of efficiency. As will be shown in the results section, this intuition
is often incorrect.

3 Network Designs 

The optimization problem above faces the difficult obstacle of a large solution 
space. For general graphs on, say, n nodes, the set of possible solutions to the
above problem has huge cardinality: 2^{\binom{n}{2}} (counting isomorphic
graphs separately). Therefore,
any practical approach must identify a relatively small subset of that huge space.
Moreover, given a set of highly-rated networks (e.g. top 1%), it may be
challenging to characterize them - to identify what features give those particular
networks their desirable properties. 
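
To give a sense of scale, the number of labeled simple graphs on n nodes can be evaluated exactly with integer arithmetic. The snippet is only an illustration; the function name is hypothetical and the count treats isomorphic graphs as distinct.

```python
import math


def labeled_graph_count(n):
    """Number of simple graphs on n labeled nodes: 2 ** C(n, 2)."""
    return 2 ** math.comb(n, 2)


# For n = 100 there are C(100, 2) = 4950 possible edges.
possible_edges = math.comb(100, 2)
# Length of the decimal expansion of the count for n = 100.
digits_for_100 = len(str(labeled_graph_count(100)))
```

For n = 100 the count is on the order of 10^1490, which is why exhaustive enumeration is not an option even for small networks.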
3.1 Solution Approach

In order to solve those two problems, this work considers what is arguably the 
most important subset of the search space. Namely, the focus is on the set of 
all graphs on n nodes which can be constructed using a number of simple models
termed "network designs". Each model contains parameters which specify
how the network is to be generated, where each setting of the parameters is
termed a "configuration". Thus, instead of searching through the space of graphs,
the search is through the set of programs generating graphs from a model, or
more concretely, the search is through the set of parameters that control those
programs. The process is similar to evolution where selection occurs on the
phenotype of organisms - here: graphs - but the organisms are largely specified
by their genotype - here: the designs and their configuration parameters.
In other words, each design D has configurations C_1^D, C_2^D, .... In experiments
each configuration C_i^D is input to a program that generates a sample of
networks, whose average performance provides an estimate of the fitness of C_i^D (see
Appendix, section C for details).
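
The search over configurations described above can be sketched generically. This is a hedged illustration, not the procedure of the Appendix: the function names, the sample size and the exhaustive scan over a finite list of configurations are all assumptions, and the toy design and toy fitness at the bottom (a random graph scored by its edge density) are stand-ins chosen only so that the example runs on its own.

```python
import random
from statistics import mean


def estimate_fitness(config, generate, fitness, samples, rng):
    """Average fitness over `samples` networks generated from one configuration."""
    return mean(fitness(generate(config, rng)) for _ in range(samples))


def best_configuration(configs, generate, fitness, samples=20, seed=0):
    """Return (estimated fitness, configuration) for the fittest configuration."""
    rng = random.Random(seed)
    scored = [
        (estimate_fitness(c, generate, fitness, samples, rng), c) for c in configs
    ]
    return max(scored, key=lambda pair: pair[0])


def generate_random_graph(p, rng, n=8):
    """Toy design: each of the C(n, 2) edges is present with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def edge_density(adj):
    """Toy stand-in for fitness: fraction of possible edges that are present."""
    n = len(adj)
    edges = sum(len(neighbors) for neighbors in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


score, config = best_configuration([0.0, 0.5, 1.0], generate_random_graph, edge_density)
```

In an actual experiment the toy fitness would be replaced by the weighted combination of resilience and efficiency, and the list of configurations by a grid over the parameters of the chosen design.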

As to the characterization problem, given that one would know how the 
networks were generated, one can more easily characterize the optimal designs 
simply by looking at the parameter values of the programs that generated them. 
Implicitly, this procedure assumes that for a given set of parameter values all of 
the models/programs produce similar networks, as far as efficiency and resilience 
are concerned. Hence, the words "network" and "configuration" will be used 
interchangeably, even though the former refers to a single graph, while the 
latter refers to a class of graphs generated using the same process.

Another advantage of this approach over brute-force enumeration is that a 
graph-generating program is an analog of instructions or protocols by which 
real networks are constructed. Practical limitations prevent those instructions 
from being complex, and hence the set of graphs constructed by such a program 
is a more relevant search space for practical applications than the set of all
2^{\binom{n}{2}} graphs on n nodes.

3.2 Network Designs 

A variety of graph generating models have been proposed: Poisson [12], small
world [32], preferential attachment [3], and many others. However, these models
recreate networks whose construction principles are quite different from what 
is called for in cascade-resilient networks. In particular, many models produce 
graphs with a relatively large number of high degree nodes - a feature that 
strongly promotes cascades [9] and under certain conditions even facilitates
epidemics that sweep most of the nodes in the network [30]. While in some 
applications (such as scientific collaboration networks) cascades are desirable, 
the situation is reversed here. 

Fortunately, a good source of suitable candidates is found in studies of clan- 
destine networks. It is known that terrorist networks are often partitioned into 
cells which operate largely independently of each other. Moreover, the
leadership of the terrorist group is often not even in direct contact with the cells,
instead providing strategic guidance and manuals through public forums like
websites. Motivated by these findings, let us consider networks that consist of
identical "cells" where each cell is either (a) a clique (a complete graph), (b) a star
(with a central node called "leader") or (c) a cycle. Each of these has a single
parameter, k - the number of nodes in the cell, in addition to n - the total
number of nodes. Let us also consider graphs consisting of (d) randomly-connected
cliques and (e) randomly-connected stars, in both cases according to probability
p (termed the "cavemen" graph and the "connected stars" graph, respectively), as
well as (f) the simpler Erdős-Rényi G(n,p) random graph with probability p
(see Fig. 2). Analytic expressions of the fitness for some of those designs are
available (see Appendix, sec. D).
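
The three purely cellular designs (a)-(c) are simple enough to generate in a few lines. The sketch below is illustrative only: the adjacency-dict representation, the function names, the choice of the first node of each cell as the star's leader, and the handling of a final smaller cell when k does not divide n are all assumptions. The randomly-connected designs (d) and (e) are not sketched because the text above does not fix how cells are linked.

```python
def cell_edges(kind, members):
    """Edges of one cell of the given kind over the listed member nodes."""
    k = len(members)
    if kind == "clique":
        return [(members[i], members[j]) for i in range(k) for j in range(i + 1, k)]
    if kind == "star":
        # Assumption: the first member acts as the cell's leader.
        return [(members[0], m) for m in members[1:]]
    if kind == "cycle":
        if k < 3:
            # Too few nodes for a cycle: fall back to a single edge or no edge.
            return [(members[i], members[i + 1]) for i in range(k - 1)]
        return [(members[i], members[(i + 1) % k]) for i in range(k)]
    raise ValueError(f"unknown cell kind: {kind}")


def cellular_graph(kind, n, k):
    """n nodes partitioned into disjoint cells of k nodes (the last may be smaller)."""
    adj = {v: set() for v in range(n)}
    for start in range(0, n, k):
        members = list(range(start, min(start + k, n)))
        for u, v in cell_edges(kind, members):
            adj[u].add(v)
            adj[v].add(u)
    return adj
```

For example, cellular_graph("star", 6, 3) yields two stars with leaders 0 and 3, while k = 1 yields a network with no edges and k = n yields a single cell containing every node.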

Armed with those designs, the computation below will determine which of
them is optimal (i.e. how to build the cascade-resilient network), as well as
how the optimal design should be configured (e.g. how large each cell should
be). The networks produced by different configurations of a single design could
be quite different from each other. For instance, the cellular designs could be
configured to create networks of multiple disconnected components¹ as well as
the more extreme networks without any edges or with all nodes in the same
component. Some of the designs generalize others, implying that they are
expected to perform at least as well: the connected stars design includes both
the Erdős-Rényi design (by setting k = 1) as well as the stars design (by setting
p = 0). Similarly, the cavemen graph includes both the Erdős-Rényi random
graph (by setting k = 1) as well as the cliques graph (by setting p = 0). Note
also that some designs have structural limitations so that possibly none of their
configurations can achieve R(G) = 1 or W(G) = 1, respectively. For example,
the stars design cannot achieve W(G) = 1 for any positive attenuation
exponent. As we shall see, this will affect its optimal configuration for extreme values
of the parameters.
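
A short worked calculation may make the limitation of the stars design concrete. It assumes the efficiency metric of section 2.2 (the average of 1/d(u,v)^g over ordered pairs of distinct nodes, with unreachable pairs contributing 0) and, for simplicity, a single star spanning all n = k ≥ 3 nodes. The notation S_k and the setup are illustrative assumptions, not the derivation given in the Appendix.

```latex
% Ordered pairs in a star S_k on k nodes:
%   2(k-1)      leader-leaf pairs at distance 1,
%   (k-1)(k-2)  leaf-leaf pairs at distance 2.
\[
W(S_k) = \frac{2(k-1)\cdot 1 + (k-1)(k-2)\cdot 2^{-g}}{k(k-1)}
       = \frac{2 + (k-2)\,2^{-g}}{k}
       < \frac{2 + (k-2)}{k} = 1
       \qquad (g > 0,\; k \ge 3).
\]
```

For instance, k = 4 and g = 1 give W = 0.75. A configuration with several disjoint stars also stays below 1, since pairs of nodes in different cells contribute 0.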

It is clear that the above palette of designs is far too short to provide im- 
mediate value to all of the application areas where cascade resilience is desired. 
The objective here is to propose an approach which could be applied to different 
domains, as well as begin constructing a theory to address cascade resilience. 
It should also be noted that the "optimal design" is likely to be a function of
exogenous parameters such as the cascade risk, τ, the weight of resilience, r, as well as the
application area, whose functional constraints may make some of the designs
unsuitable.

A particularly important problem is understanding the structure of terrorist
networks. These networks are prototypical examples of networks that are
optimized for both resilience (their adversary is various government agencies)
and efficiency (to be able to recruit and to carry out terrorist attacks). Unfortunately,
it is hard to obtain detailed data on their structure, with the notable exception
of the 9/11 network [20] and some historic underground groups, such as FTP
illustrated in the introduction. For the FTP network, a "battalion" (228 nodes,
462 edges) was constructed based on the account in [24]. Although both the
FTP fighters and the 9/11 terrorists are secret societies, the author does not
propose any moral equivalence between their objectives.

1 It has been argued that disconnected graphs are not realistic as models for many
applications, but several reasons suggest otherwise. First, a variety of networks are only connected
in the topological sense, and in fact, a very small number of edges act as bridges between
parts of the network. Second, in many networks certain edges are highly resilient to cascade
propagation, or could be made to be. Thus the disconnected network provides a simplified
model of networks containing regular and resilient edges.

[Figure 2 panels (as recovered): (d) cavemen; (e) connected stars; (f) Erdős-Rényi]

Figure 2: Illustration of the 6 designs on 50-node networks.



4 Results 

This section addresses several questions. The first is to determine which of the 
designs is optimal, which may depend on the settings of the parameters. Recall 
that the model includes the parameters τ, r and g representing contagion risk,
the importance of resilience as opposed to efficiency, and the attenuation of 
the network, respectively. Second, within each design, what is the optimal 
configuration? 

4.1 Optimal Network 

In the first set of experiments, the connectivity attenuation was fixed at g = 1, and the
contagion risk (τ) was varied. In each setting, the optimal configuration of
each design was determined. It can be seen that the connected stars network
is optimal (Fig. 3) and, except for τ < 0.1, the star network (without
connections between leaders) is almost as fit. The optimality of the star-based
networks is due to a good trade-off between R and W: the central node in each
cell (its "leader") provides a good firewall against cascades, because most pairs
are separated by a distance of 2 (unlike in cliques where the separation is 1), but
this separation reduces efficiency only modestly, unlike the much longer average
distance in the cycle graph.

Figure 3: Fitness of the optimal configuration for various designs, and for the 9/11
and FTP networks, r = 0.49, g = 1.

An interesting qualitative observation is that within each design, as τ
increases, the fitness decreases - one cannot win when fighting cascades, only
delay. This monotonicity can be proved in general (see Appendix, sec. B).

Consider now the empirical networks: 9/11 and FTP. It is interesting that
the 9/11 network is quite successful for low values of τ (< 0.2), but then it
rapidly deteriorates. This is due to a rapid increase in the extent of cascades,
i.e. a rapid decline in resilience. This onset of rapid decline suggests that in some
types of networks, the network might be initially hard to defeat, but there is a
point after which efforts against it start to pay off. If τ is representative of the
security environment, then one can say that the 9/11 network is relatively ill-adapted
to the more rigorous security measures implemented after the attacks. Indeed, it is
likely that the 9/11 attacks would have been thwarted under the current regime
since one of the nodes was captured before 9/11. In contrast, the cellular tree
hierarchy of the FTP network is more suitable for the intermediate range of cascade
risk τ, but the average distances in it are too long to provide high efficiency.
Therefore, its performance in very low and very high ranges of τ is comparatively
poor.

[Figure 4 panels: (a) r = 0.49; (b) r = 0.51]

Figure 4: Resilience of the optimal design.

In certain applications it is possible to invest in reducing the cascade
propagation probability, τ (e.g. using noms de guerre in a secret society). Then
the curves in Fig. 3 could also be viewed as expressing the value of efforts to
reduce cascades by reducing τ. If the slope is steep then the gains are large. It
is important to remember that the fitness curves indicate the fitness of the
optimal configuration of each design, rather than a static network. If, however, the
configuration were fixed, then the fitness decrease would be even steeper as
τ increased, since changing the configuration can mitigate some of the decrease.

The fitness of an optimal network is a continuous function of the parameter
r, and so the counterpart of Fig. 3 for r = 0.51 is almost indistinguishable
(see Appendix, sec. A for justification). In contrast, resilience, efficiency and the
structure of the network may experience discontinuous "phase transitions" as r
is changed. One such transition occurs at r = 0.5 (Figs. 4 and 5). For r < 0.5,
when cascade risk is high (τ ≫ 0), the optimal design maximizes efficiency,
whereas for r > 0.5 it maximizes resilience.

Intuition suggested that the networks grow more sparse as contagion risk
grows. Instead, the results were surprising because the trend was non-monotonic
(Fig. 6). Unexpectedly, for τ → 1 and r < 0.5 some network designs (e.g.
cliques, cavemen, connected stars) became denser, instead of sparser, and for
them the most sparse networks were formed at intermediate values of τ. For
more details on configuration parameters see Appendix, sec. E. For sensitivity
analysis see Appendix, section F.



4.2 Effect of attenuation 

Various combinations of the parameters have interesting effects on the fitness,
as shown in Fig. 7. Comparing fitness for attenuation g = 0.1 against g = 10,
notice that decreasing g improves the fitness of the optimal configurations,
as expected. Furthermore, when g is small it is easy to find a highly optimal
configuration because the networks can be made sparse, improving cascade
resilience without significantly reducing efficiency. In contrast, when
attenuation g is large, efficiency cannot be achieved and the optimal
configuration of the stars design is to have cells of size 1, maximizing
resilience. It is perhaps surprising that for smaller g, high fitness is most
difficult to achieve when efficiency and resilience are approximately equally
weighted (i.e. r is near 0.5), especially when τ is near 1.0.

Figure 5: Efficiency of the optimal design. (a) r = 0.49; (b) r = 0.51.

Figure 6: Average degree in the optimal configuration of each design.
(a) r = 0.49; (b) r = 0.51.

The attenuation g also has interesting effects on the relative merits of the
various designs (Fig. 8). For example, the cycle design is not competitive with
the star design when g = 1.0, but when g = 0.1 the relatively large distances
in cycles do not decrease efficiency very much, and they help stop cascades.

Figure 7: Fitness of the optimal configuration in the stars design for various
values of g, τ and r. (a) g = 0.1; (b) g = 10.

5 Discussion 

The success of the star design could be analyzed more qualitatively. The
fitness function combines resilience R(G), which decreases when the graph
becomes more strongly connected, and efficiency W(G), which decreases when the
graph becomes more sparse. The existence of a non-trivial solution is due to
the different functional relationships. To a first-order approximation,
efficiency decreases inversely with average distance
($\sim 1/(\text{avg distance})^{g}$) while cascade propagation probability
decreases exponentially ($\sim \tau^{\text{avg distance}}$ for $\tau < 1$,
assuming a bounded number of alternative paths). For example, for the star
design $R = 1 - \tau^{2}$ and $W = 2^{-g}$ as $n = k \to \infty$. Therefore,
the optimal network's structure exploits the exponential decrease in cascades
without sacrificing too much efficiency. In the range τ ∈ [0.2, 0.7] and
r ≈ 0.5, an average distance of ≈ 2, as in the star graph, might be optimal.
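
As a worked check of the star-design limits quoted above, one can take the
closed forms given for the stars design in Appendix D (cells of size k, cascade
probability τ, attenuation g) and set k = n. This is only a sketch of the
limiting step, under the assumption that those closed forms are read correctly:

```latex
\begin{align*}
R(n,n,\tau) &= 1 - \frac{1}{n-1}\Bigl(1-\frac{1}{n}\Bigr)\bigl[2+\tau(n-2)\bigr]\tau
             = 1 - \frac{\tau\,[2+\tau(n-2)]}{n}
             \;\xrightarrow[n\to\infty]{}\; 1-\tau^{2},\\
W(n,n,g)    &= \frac{1}{n-1}\Bigl(1-\frac{1}{n}\Bigr)\bigl[2+2^{-g}(n-2)\bigr]
             = \frac{2+2^{-g}(n-2)}{n}
             \;\xrightarrow[n\to\infty]{}\; 2^{-g}.
\end{align*}
```

Both simplifications use $\frac{1}{n-1}\bigl(1-\frac{1}{n}\bigr) = \frac{1}{n}$.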

Community organization might play a role in fitness. Notice that the connected
stars graph includes the random graph (by setting k = 1); however, for most
values of τ, connected stars performs much better (similarly for cavemen vs.
cliques). Why? Perhaps to achieve high performance it is helpful to build the
graph around "communities", i.e. sets of densely-connected nodes. Indeed, the
optimal configuration away from τ → 0 is precisely based on k > 1 (see
Appendix, sec. E). The effect of community structure on cascades has been
explored extensively, but this is the first evidence that cascades could be
limited in this way while maintaining efficiency.

The finding that under high cascade risk the optimal network is dense is
interesting, because our expectation was that the optimal network would be
sparse and tree-like. Instead, it was found that at high τ values the optimal
networks have low objective values (Fig. 3) and are not optimized for
resilience at all. This may have interesting parallels in a variety of
application areas. Consider for instance non-violent resistance movements,
like the movement that brought about the independence of India and Pakistan or
those involved in the recent reforms in Serbia, Ukraine and Georgia. The
organizers of those movements intentionally chose to organize openly rather
than to form an underground. This openness greatly facilitated the movement's
growth, although it put at risk the individuals who participated. The parallel
in our model is to the sacrifice of resilience to cascades in order to gain
higher efficiency. This work suggests that such a sacrifice is worth making
even when cascade risk is high, as long as efficiency is more valued or
replacing nodes is easy (r < 0.5).

Figure 8: Fitness of the optimal configuration for each design when g = 0.1.
Data is for r = 0.49.

6 Conclusions and Future Work 

This work has explored the problem of designing networks for cascade resilience. 
The main contributions are: 

• A general definition of the problem as a multi-objective optimization prob- 
lem 

• Metrics for efficiency and resilience that work well in various networks, 
including disconnected topologies 
• Evidence for optimality of star-like topologies

• Evidence for non-monotonicities in the edge density as a function of cas- 
cade risk 

• Evidence that the cellular hierarchical network, such as the FTP, is suitable
for intermediate ranges of risk, and that the 9/11 network would have been
easily defeated under a more rigorous security environment.

Much further work remains to be done in this area. One interesting direction
is optimal design in heterogeneous rather than simple graphs. The former are
important in practice and are potentially more involved. For example, there
could be two or more classes of nodes, with different effects on efficiency and
resilience. One of them could be "immune" to contagions. As well, the current
contagion model could be usefully generalized to other models (multiplexed
contagions [6], the SIS model or threshold contagions [28]). It should also be
worthwhile to explore questions about dynamics, such as how to grow networks
while maintaining both their efficiency and cascade resilience. More
theoretically, the discussion of designs suggests consideration of Kolmogorov
complexity as applied to networks. It is possible that the number of
parameters in a design constrains the optimality of the network.
A Continuity of Fitness

It was claimed in sec. 4.1 that fitness is continuous. Notice that the claim is
not about the continuity of fitness of a single configuration as a function of
r but rather that:

Claim: $f(r) = \max_{G \in \mathcal{G}} F(G, r)$ is continuous for $r \in [0, 1]$.

Proof: Consider an optimal configuration $C_1$ of a design for $r = r_1$ and
let its fitness be $f_1 = F(C_1, r_1)$. Observation 1: at $r = r_2$ the fitness
of $C_1$ is different by at most $|r_2 - r_1|$, explicitly:
$|F(C_1, r_2) - f_1| \le |r_2 - r_1|$ (by linearity of Eq. 1 in R and W and
because $0 \le R \le 1$ and $0 \le W \le 1$). Observation 2: the optimal
configuration for $r = r_2$, $C_2$, with fitness $f_2 = F(C_2, r_2)$ must
satisfy $f_2 - f_1 \le |F(C_1, r_2) - f_1| \le |r_2 - r_1|$, otherwise it would
contradict that $f_1$ is the optimal fitness for $r = r_1$ and/or contradict
Observation 1. Similar arguments applied to $C_1$ imply that
$f_1 - f_2 \le |r_2 - r_1|$. Continuity is then proved by taking the limit:
$\lim_{r_2 \to r_1} |f_1 - f(r_2)| \le \lim_{r_2 \to r_1} |r_2 - r_1| = 0$.
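
The bound behind this argument can also be checked numerically. The sketch
below is illustrative only: the "design" is a hypothetical set of 50
configurations, each summarized by a random pair (R, W) in [0, 1], and the
function name `f` simply mirrors the notation of the claim. It measures the
steepest change of f(r) on a fine grid, which should not exceed 1.

```python
import random

rng = random.Random(0)

# Hypothetical design: each configuration is summarized by (R, W) in [0, 1]^2.
configs = [(rng.random(), rng.random()) for _ in range(50)]


def f(r):
    """Optimal fitness f(r) = max over configurations of r*R + (1 - r)*W."""
    return max(r * R + (1 - r) * W for R, W in configs)


# Largest finite-difference slope of f over a fine grid of r values.
grid = [i / 1000 for i in range(1001)]
max_slope = max(abs(f(b) - f(a)) / (b - a) for a, b in zip(grid, grid[1:]))
print(f"largest observed slope of f(r): {max_slope:.3f} (bound: 1)")
```

Because f is a maximum of finitely many linear functions with slopes R - W in
[-1, 1], it is piecewise linear and continuous, even though the maximizing
configuration may switch abruptly.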
B Extent and Contagion Risk

Proposition: Let

\[
f(\tau) = \max_{G \in \mathcal{G}}
  \underbrace{\bigl[\, r R(G,\tau) + (1-r) W(G) \,\bigr]}_{F(G,\tau)}
\]

be the fitness of the optimal graph $G^*$ for a fixed network design
$\mathcal{G}$, for cascade probability $\tau$. Then $f(\tau)$ is a
non-increasing function of $\tau$.

Proof of Proposition: The proof relies on a simple claim that resilience of
networks does not increase when $\tau$ increases. Namely:

Claim: for every simple graph $G$, if $\tau^+ > \tau$ then
$R(G,\tau) \ge R(G,\tau^+)$, that is, increasing cascade probability does not
increase resilience.

Take the claim as given, and assume by contradiction that $\tau^+ > \tau$ and

\[
f(\tau^+) > f(\tau). \tag{4}
\]

Let $G^*_+$, $G^*$ be any two optimal networks for $\tau^+$ and $\tau$,
respectively, namely:

\[
G^*_+ \in \arg\max_{G \in \mathcal{G}} \bigl[ r R(G,\tau^+) + (1-r) W(G) \bigr]
\quad\text{and}\quad
G^* \in \arg\max_{G \in \mathcal{G}} \bigl[ r R(G,\tau) + (1-r) W(G) \bigr].
\]

By optimality of $G^*$, get that

\[
\underbrace{F(G^*,\tau) - F(G^*_+,\tau)}_{=\,\Delta} \ge 0. \tag{5}
\]

Expanding $\Delta$:

\[
\begin{aligned}
\Delta &= F(G^*,\tau) - r R(G^*_+,\tau) - (1-r) W(G^*_+) \\
       &\le F(G^*,\tau) - r R(G^*_+,\tau^+) - (1-r) W(G^*_+)
         && \text{by the Claim} \\
       &= F(G^*,\tau) - F(G^*_+,\tau^+) < 0
         && \text{by the assumption (4).}
\end{aligned}
\]

This implies that $F(G^*,\tau) - F(G^*_+,\tau) < 0$ and therefore contradicts
that $G^*$ is an optimal network for $\tau$ (Eq. 5).

Proof of Claim: Consider the first step of the cascade, i.e. as it expands from
a single node in state I. There are no removed nodes, and hence, if the initial
node is not disconnected, the probability that the contagion would reach any
other node is strictly greater for $\tau^+$. Next, consider any state
$\mathcal{X}$ of the graph containing nodes in S, I, and R modes (susceptible,
infected and removed, resp.). Consider also any state $\mathcal{X}'$ which can
be produced in one cascade event from $\mathcal{X}$ (note that the time steps
of the simulation above may each contain several such events). The possible
differences between $\mathcal{X}$ and $\mathcal{X}'$ are (1) a single node
changing I → R or (2) a node changing S → I. Observe that the probability of
either event happening is not smaller under $\tau^+$ as compared to $\tau$.
Since any cascade can be decomposed into cascade steps, the expected extent of
the contagion is not smaller under $\tau^+$ as compared to $\tau$. In sum, the
expected size of the contagion does not decrease when moving from $\tau$ to
$\tau^+$. More formally, let $X_{i,t}$ be the event that the contagion is in
state i at time t, where a "state" contains information on which nodes are in
each of the S, I, R modes. Consider two network states $X_{j,t+1}$ and
$X_{k,t+1}$ reachable from $X_{i,t}$, where $X_{k,t+1}$ has no fewer nodes in
level I. Under the measure induced by $\tau^+$ the probability of transitions
to state $X_{k,t+1}$ is not smaller than under $\tau$, for all such states. By
induction on t from 1 to T (T is the time at which the epidemic has no more
infected nodes, i.e. nodes in level I), the mean extent must be not smaller
for $\tau^+$.



C Simulation Methodology 

The resilience metric is most easily computed by simulation, where a node is
selected at random to be "infected", and the simulation is run until all nodes
are in states S or R, and none is in state I. A contagion that starts at a
single node would run for up to n steps, but usually far fewer since typically
τ < 1 and/or the graph is not connected. To achieve a good estimate of the
average extent, the procedure was replicated 40 times, and then continued as
long as necessary to achieve an error of under ±1 node with a 95% confidence
interval.
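
The procedure above can be sketched as follows. This is a minimal illustration,
not the original program. It assumes that each infected node gets one chance to
infect each susceptible neighbor with probability τ before moving to state R,
that the confidence interval uses a normal approximation, and that resilience
is normalized as R = 1 - (mean extent - 1)/(n - 1). The helper names
(`cascade_extent`, `mean_extent`, `resilience`) are hypothetical.

```python
import math
import random


def cascade_extent(adj, tau, rng):
    """Run one SIR cascade from a random seed; return how many nodes were hit."""
    seed = rng.randrange(len(adj))
    infected, removed = {seed}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = (v not in infected and v not in removed
                               and v not in newly_infected)
                if susceptible and rng.random() < tau:
                    newly_infected.add(v)
        removed |= infected          # infected nodes move to state R
        infected = newly_infected    # loop ends when no node is in state I
    return len(removed)


def mean_extent(adj, tau, rng, min_runs=40, half_width=1.0, max_runs=200_000):
    """Replicate at least `min_runs` times, then until the 95% confidence
    half-width of the mean extent (normal approximation) is below `half_width`."""
    count, total, total_sq = 0, 0.0, 0.0
    while count < max_runs:
        x = cascade_extent(adj, tau, rng)
        count, total, total_sq = count + 1, total + x, total_sq + x * x
        if count >= min_runs:
            mean = total / count
            variance = max(total_sq - count * mean * mean, 0.0) / (count - 1)
            if 1.96 * math.sqrt(variance / count) < half_width:
                break
    return total / count


def resilience(adj, tau, rng, **kwargs):
    """Assumed normalization: R = 1 - (mean extent - 1) / (n - 1), for n > 1."""
    n = len(adj)
    return 1.0 - (mean_extent(adj, tau, rng, **kwargs) - 1.0) / (n - 1)


# Example: a single star on 30 nodes (node 0 is the hub).
n, tau = 30, 0.5
star = [list(range(1, n))] + [[0] for _ in range(n - 1)]
print(round(resilience(star, tau, random.Random(0)), 3))
```

For a single star the estimate can be compared with the closed form for the
stars design in Appendix D with k = n, which gives a quick sanity check of the
simulation.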

For each design and configuration, the program generated 1 to 10 sample
networks (depending on the variability characteristic of the design) of 100 nodes 
each, and computed the average objective function value. The coefficient of 
variation in the fitness of the sample networks was monitored to ensure that 
the average is a reliable measure of performance. Typically variation was < 
0.2 except near phase transitions of connectivity and percolation. In designs 
consisting of cells of size k, in order to consider a spectrum of k values some of 
which might not divide 100, the number of nodes was chosen to be either the 
largest multiple of k less than 100 or the smallest multiple larger than 100. If 
k was fractional, cell sizes were sampled from a normal distribution with mean 
k and standard deviation 0.3. In general, numerical experiments showed that 
normalization in the definitions of resilience and efficiency ensures that even 
when the number of nodes is tripled the effect of network size on fitness is very 
small for the above designs (around ±0.05). 

Optimization was performed using simple grid search without grid refine- 
ment but alternative methods (e.g. Nelder-Mead) were considered. Grid search 
was chosen despite its computational cost because it suffers no convergence 
problems even in the presence of noise (present due to variations in topology 
and contagion extent), and collects data useful for sensitivity analysis. 
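
A plain grid search of this kind can be sketched as below. The objective
`toy_fitness` is a hypothetical noisy stand-in for the simulated fitness of a
configuration (k, p), and `grid_search` is an illustrative helper rather than
the original code. Averaging several samples per grid point plays the role of
the sample networks described above, and the full table is kept so that it can
be reused for sensitivity analysis.

```python
import itertools
import random


def grid_search(objective, grid, samples=5):
    """Evaluate `objective` at every grid point, averaging `samples` noisy calls.

    Returns the best point (as a dict) and the table of averaged values.
    """
    names = list(grid)
    table = {}
    for point in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, point))
        table[point] = sum(objective(**params) for _ in range(samples)) / samples
    best = max(table, key=table.get)
    return dict(zip(names, best)), table


# Hypothetical noisy objective with its maximum at k = 5, p = 0.3.
rng = random.Random(0)


def toy_fitness(k, p):
    return -((k - 5) ** 2) / 100 - (p - 0.3) ** 2 + rng.gauss(0, 0.001)


best, table = grid_search(
    toy_fitness,
    {"k": range(1, 11), "p": [i / 10 for i in range(11)]},
)
print(best, len(table))
```

Because every grid point is evaluated regardless of the noise, there is no
convergence criterion to fail, at the price of one evaluation batch per point.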

An analytic computation of the cascade extent metric was investigated. It
is possible in theory because the contagion is a Markov process with states
in the power set of the set of nodes (of size $2^n$). Unfortunately, such a
state space is impractically large. When G is a tree, an analytic expression
exists (specifically, the mean contagion size is
$1 + \frac{\tau G_0'(1)}{1 - \tau G_1'(1)}$, where $G_0(x)$ generates the degree
distribution and $G_1(x) = G_0'(x)/G_0'(1)$ generates the probability of
arrival to a node [27]), and it might be feasible when the treewidth is small
[8, 27]. However, in many of the graphs considered the tree approximation is
not suitable. A fruitful approximate approach is to represent the contagion
approximately as a system of differential equations which can be integrated
numerically [TS]. These analytic possibilities were not pursued since the
simulation approach was sufficient and could be applied to all graphs.



D Analytic Results 

The information provided by simulations is valuable but limited, as simulations 
cannot be run for the entire infinity of parameter values and design configura- 
tions. Fortunately, it is possible to analytically derive the values of the resilience, 
efficiency (and hence fitness) functions for certain simple designs: the cycles and 
the stars designs. Recall that n is the number of nodes and k is the number of 
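nodes per cell. For k = 1, in both designs R = 1 and W = 0. When k > 2, for
the cycle design:

\[
R(n,k,\tau) = 1 - \frac{1}{n-1}
  \left[ 2\tau\,\frac{1-\tau^{k-1}}{1-\tau} - (k-1)\,\tau^{k} \right]
\]

\[
W(n,k,g) = \frac{1}{n-1}
\begin{cases}
  \left(\tfrac{k}{2}\right)^{-g} + 2\sum_{d=1}^{k/2-1} d^{-g} & k \text{ even} \\[4pt]
  2\sum_{d=1}^{(k-1)/2} d^{-g} & k \text{ odd}
\end{cases}
\]

and for the stars design:

\[
R(n,k,\tau) = 1 - \frac{1}{n-1}\left(1-\frac{1}{k}\right)
  \bigl[2 + \tau(k-2)\bigr]\,\tau
\]

\[
W(n,k,g) = \frac{1}{n-1}\left(1-\frac{1}{k}\right)
  \bigl[2 + 2^{-g}(k-2)\bigr].
\]

These expressions are not readily useful for continuous optimization since k is
discrete, but they can be used to obtain a plot of the fitness function, and
identify phase transitions. Thus, they help inform optimization for designs
where no analytic expression is available.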
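
The closed forms can be coded directly and cross-checked against a brute-force
computation on small graphs. The sketch below is an illustration under stated
assumptions: efficiency is taken to be the average of d(u, v)^(-g) over ordered
node pairs, with unreachable pairs contributing zero, and the cycle and star
expressions are written as they follow from that definition and from
independent transmission with probability τ along each edge. All function names
are hypothetical.

```python
from collections import deque


def stars_R(n, k, tau):
    """Resilience of the stars design with cells of size k."""
    return 1 - (1 - 1 / k) * (2 + tau * (k - 2)) * tau / (n - 1)


def stars_W(n, k, g):
    """Efficiency of the stars design with cells of size k."""
    return (1 - 1 / k) * (2 + 2 ** (-g) * (k - 2)) / (n - 1)


def cycles_R(n, k, tau):
    """Resilience of the cycles design; intended for k > 2 and 0 <= tau < 1."""
    spread = 2 * tau * (1 - tau ** (k - 1)) / (1 - tau) - (k - 1) * tau ** k
    return 1 - spread / (n - 1)


def cycles_W(n, k, g):
    """Efficiency of the cycles design; intended for k > 2."""
    if k % 2 == 0:
        total = (k / 2) ** (-g) + 2 * sum(d ** (-g) for d in range(1, k // 2))
    else:
        total = 2 * sum(d ** (-g) for d in range(1, (k - 1) // 2 + 1))
    return total / (n - 1)


def efficiency_by_bfs(adj, g):
    """Average of d(u, v)**(-g) over ordered pairs; unreachable pairs count 0."""
    n = len(adj)
    total = 0.0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d ** (-g) for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def cells(n, k, cell_edges):
    """Adjacency lists for n/k disjoint cells, each wired by cell_edges(k)."""
    adj = [[] for _ in range(n)]
    for base in range(0, n, k):
        for a, b in cell_edges(k):
            adj[base + a].append(base + b)
            adj[base + b].append(base + a)
    return adj


def star_edges(k):
    return [(0, j) for j in range(1, k)]


def cycle_edges(k):
    return [(j, (j + 1) % k) for j in range(k)]


n, g = 12, 1.5
print(stars_W(n, 4, g), efficiency_by_bfs(cells(n, 4, star_edges), g))
print(cycles_W(n, 6, g), efficiency_by_bfs(cells(n, 6, cycle_edges), g))
```

The same formulas can then be evaluated over all admissible k to plot fitness
and locate phase transitions without running simulations.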

In the stars design, when R and W are weighted equally ($r = \tfrac{1}{2}$),
fitness takes a relatively simple form:

\[
F = \frac{1}{2} + \frac{1}{2(n-1)}\left(1-\frac{1}{k}\right)
    \bigl[\,2(1-\tau) + (2^{-g} - \tau^{2})(k-2)\,\bigr].
\]

This implies that increasing cell size k, for k large, improves fitness iff
$2^{-g} - \tau^{2} > 0$. Hence the optimal configuration has one cell (k = n),
until a threshold near $\tau = 2^{-g/2}$ (for g = 1, approximately 0.71). This
agrees with the findings in Fig. 9.

Also, the rate of change in fitness with respect to $\tau$,
$\frac{\partial F}{\partial \tau} = \frac{1}{2(n-1)}\left(1-\frac{1}{k}\right)
\bigl[-2 - 2\tau(k-2)\bigr]$, is always negative, as expected on more general
grounds (see Appendix, sec. B). It is linear in $\tau$ (because it is a tree
graph) but superlinear in k (because of the mutual hazard induced by adding
nodes to cells).
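
The threshold can be explored with a few lines of code. The sketch below
assumes the simplified r = 1/2 fitness for the stars design,
F = 1/2 + (1 - 1/k)[2(1 - τ) + (2^(-g) - τ^2)(k - 2)] / (2(n - 1)), and restricts
k to divisors of n so that all cells have equal size. The names `fitness_half`
and `best_cell_size` are hypothetical.

```python
def fitness_half(n, k, tau, g):
    """Stars-design fitness at r = 1/2 (simplified closed form)."""
    bracket = 2 * (1 - tau) + (2 ** (-g) - tau ** 2) * (k - 2)
    return 0.5 + (1 - 1 / k) * bracket / (2 * (n - 1))


def best_cell_size(n, tau, g):
    """Cell size k, among divisors of n, that maximizes fitness_half."""
    divisors = [k for k in range(1, n + 1) if n % k == 0]
    return max(divisors, key=lambda k: fitness_half(n, k, tau, g))


n, g = 60, 1.0
threshold = 2 ** (-g / 2)
for tau in (0.5, 0.6, 0.7, 0.72, 0.8, 0.9):
    print(f"tau = {tau:.2f}: best k = {best_cell_size(n, tau, g)}")
print(f"threshold 2^(-g/2) = {threshold:.3f}")
```

Below the threshold the single-cell configuration k = n wins, while above it
the best cell size drops, which is the qualitative behavior described in the
text.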
E Configuring the Optimal Design

As τ is varied, the optimal configuration changes. This section shows those
changes in the values of the parameters k (cell size) and p (connectivity). In
other words, it indicates how each of the designs ought to be configured to
attain optimal fitness, as a function of resilience weighting, r, and cascade
probability, τ.

The cell size parameter k is non-monotonic for various designs under r < 0.5
(Fig. 9). For example, for the cavemen design, at low contagion risk (τ < 0.1),
k is high (comparable to the size of the network, i.e. k → n), then it falls to
a small number. At high contagion risk (τ > 0.6) the network is again highly
connected, with k → n. Thus for τ → 1, the optimal network is the
fully-connected graph.

In general, designs involving both the p and k parameters show an intricate
interplay between the two (Fig. 10). For example, in the connected stars design
under r < 0.5 there are two phase transitions in connectivity p: as τ
increases, at a first threshold τ ≈ 0.1 it transitions from a connected graph
to disconnected cells, and at a second threshold τ ≈ 0.7 back to full
connectivity. If r > 0.5 the second transition is extinguished. The data
requires care to interpret: under r > 0.5 the fluctuations in p in the range
τ ∈ [0.1, 0.65] of the connected stars design are noise because there is a
single cell and a single cell leader (k = n), and so the parameter p has no
effect. For sensitivity analysis see Appendix, sec. F.





Figure 9: Cell size k in the optimal configuration of each design.
(a) r = 0.49; (b) r = 0.51.

Figure 10: Connectivity p in the optimal configuration of each design, plotted
against τ. (a) r = 0.49; (b) r = 0.51.

F Sensitivity Analysis 

It is desirable to determine how much variability exists within the optimal
values. One possible approach is to consider all configurations whose fitness
is greater than 0.95 of the fitness of the optimal solution, and describe the
variability in this space. Since in practice the space is infinite, sampling is
necessary. The plots of standard deviation within various properties of those
configurations are shown in Figs. 11, 12, 13, 14 and 15.

Overall, as one would expect, the properties are more variable near the
transition point r = 0.5, as compared to r values away from r = 0.5. Moreover,
variability is high within each design whenever the design undergoes a phase
transition, since multiple different phases have nearly equal fitness. Designs
with two parameters are more variable than those with a single parameter
because the former can sometimes reproduce the same graphs with many different
parameter settings - the parameters have "non-orthogonal" effects.
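
The selection step can be sketched as follows. This is an illustration only:
the sampled configurations in `results` are made-up numbers, the helper
`near_optimal_spread` is hypothetical, and the population standard deviation is
an assumption (the text does not say which estimator was used).

```python
import statistics


def near_optimal_spread(results, fraction=0.95):
    """Standard deviation of each property among configurations whose fitness
    is at least `fraction` of the best fitness found.

    `results` is a list of dicts with a "fitness" key plus numeric properties.
    """
    best = max(row["fitness"] for row in results)
    keep = [row for row in results if row["fitness"] >= fraction * best]
    props = [key for key in results[0] if key != "fitness"]
    return {p: statistics.pstdev([row[p] for row in keep]) for p in props}


# Made-up sample of evaluated configurations (for illustration only).
results = [
    {"fitness": 0.80, "k": 10, "p": 0.1},
    {"fitness": 0.79, "k": 12, "p": 0.1},
    {"fitness": 0.77, "k": 8, "p": 0.3},
    {"fitness": 0.60, "k": 2, "p": 0.9},
]
spread = near_optimal_spread(results)
print(spread)
```

A large spread in a parameter among near-optimal configurations indicates that
fitness is insensitive to that parameter in the region of the optimum.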



Figure 11: Standard deviation in resilience, within the top 5% of solutions.
(a) r = 0.25; (b) r = 0.49; (c) r = 0.51; (d) r = 0.75. Designs shown: Cavemen,
ConnStars, Cliques, Cycles, G(n,p), Stars.

Figure 12: Standard deviation in efficiency, within the top 5% of solutions.

Figure 14: Standard deviation in connectivity p of the top 5% of solutions.